The Social Science Research Institute of the University of Southern
California was founded on July 1, 1972 to permit USC scientists to
bring their scientific and technological skills to bear on social and public
policy problems. Its staff members include faculty and graduate students
from many of the Departments and Schools of the University.

SSRI's research activities, supported in part from University funds
and in part by various sponsors, range from extremely basic to relatively
applied. Most SSRI projects mix both kinds of goals; that is, they contribute
to fundamental knowledge in the field of a social problem and, in
doing so, help to cope with that problem. Typically, SSRI programs are
interdisciplinary, drawing not only on its own staff but on the talents of
others within the USC community. Each continuing program is composed
of several projects; these change from time to time depending on staff
and sponsor interest.

At present (Spring, 1975), SSRI has four programs:

Criminal justice and juvenile delinquency. Typical projects include
studies of the effect of diversion on recidivism among Los Angeles area
juvenile delinquents, and evaluation of the effects of decriminalization
of status offenders.

Decision analysis and social program evaluation. Typical projects
include study of elicitation methods for continuous probability distributions
and development of an evaluation technology for California Coastal
Commission decision-making.

Program  for  data  research.  A typical  project  is  examination  of 
small-area  crime  statistics  for  planning  and  evaluation  of  innovations  in 
California  crime  prevention  programs. 

Models for social phenomena. Typical projects include differential-
equation models of international relations transactions and models of
population flows.


SSRI anticipates continuing these four programs and adding new
staff and new programs from time to time. For further information, pub-

Eliciting Subjective Probability Distributions on
Continuous Variables

David  A.  Seaver,  Detlof  v.  Winterfeldt,  and  Ward  Edwards 

Probabilities are orderly numerical representations of personal opinions
about possible events (Savage, 1954; see also Edwards, Lindman, and Savage,
1963). Such opinions must be communicated in order to be used; the process,
important for many practical purposes, of requesting someone to communicate
such numbers is called elicitation. Unfortunately, the truism of psychophysics
that the same question, asked in two different though formally equivalent
ways, will lead to different answers applies to judgments of uncertainty
as it does to all other judgments.

If different elicitation procedures produce different numbers, which
procedure and numbers should we believe and use? At a very abstract and
philosophical level, the question is unanswerable; probabilities are judgments
made by unique individuals about unique events, and so cannot be right, or
wrong, or better, or worse. More practically, we can identify five properties
that we should like individual probability estimates or ensembles of such
estimates to have. Presumably the better estimates are on these five criteria,
the more faith we will have in their validity.

1. Estimates should obey the usual laws of probability. In particular,
the probabilities of an exhaustive set of mutually exclusive events should
sum to 1, and probabilities of independent events should multiply.

2. Probabilities should be extreme. If elicitation method A assigns
p = .60 to an event, while elicitation method B assigns p = .80, then A did
worse than B if the event later happens, and better if it does not. Murphy
and Winkler (1968) call this property primary validity.

3. Probability distributions, taken over an ensemble of events, should
yield relative frequencies close to the relative frequencies estimated for
them. For the discrete case, for example, all events assigned probability .60
should have in common the property that about 60% of them occur. For the
continuous case, a 90% credible interval over a continuous variable should have
the property that about 90% of the true values of that variable fall within
that interval. Murphy and Winkler call this property secondary validity.

Note that in practice properties 2 and 3 can conflict. A good way of
satisfying property 3 for predictions of rainfall, for example, would be to
determine last year's relative frequency of rainy days and use that number as
the estimated probability of rain every day this year. Obviously such a
procedure, while it would do well with respect to property 3, would be very
poor with respect to property 2.

4. Scores calculated from what are called proper or reproducing scoring
rules (see Toda, 1963; Aczel and Pfanzagl, 1966) in effect combine properties
2 and 3. Such rules have the property that the expected value of the score is
maximized if and only if the estimator correctly reports his true opinion.

5. Responsiveness to evidence should characterize good probability
assessments. This criterion is difficult to state precisely. A rough statement
would be that probabilities should be modified by evidence in a manner
specified by Bayes's Theorem. Strictly speaking, this is simply criterion 1
restated, since Bayes's Theorem is (like virtually all other combination rules
for probability) a direct consequence of the fact that probabilities sum to
one and that independent probabilities multiply.

Much more vaguely interpreted, property 5 means that probability estimates
should be reasonable; the meaning of that word in this context is much
the same as its meaning in law.
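
The defining property of a proper scoring rule (criterion 4) can be checked numerically. The sketch below is illustrative only: it uses the quadratic rule as one familiar example of a proper rule, not a rule taken from the studies cited, and the function names are hypothetical. It searches a grid of possible reports and confirms that the expected score is highest when the reported probability equals the estimator's true opinion.

```python
def quadratic_score(report: float, occurred: bool) -> float:
    """Quadratic score, oriented so that larger values are better."""
    outcome = 1.0 if occurred else 0.0
    return 1.0 - (outcome - report) ** 2


def expected_score(true_belief: float, report: float) -> float:
    """Expected score of `report` when the estimator's real opinion is `true_belief`."""
    return (true_belief * quadratic_score(report, True)
            + (1.0 - true_belief) * quadratic_score(report, False))


def best_report(true_belief: float, steps: int = 100) -> float:
    """Grid search for the report that maximizes the expected score."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda r: expected_score(true_belief, r))


if __name__ == "__main__":
    for belief in (0.2, 0.6, 0.8):
        print(belief, best_report(belief))
```

The same functions illustrate criterion 2: `quadratic_score(0.8, True)` exceeds `quadratic_score(0.6, True)`, and the ordering reverses if the event does not occur.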

Experimental work has been done bearing on all five criteria. Subjective
probability distributions assessed by different techniques are inconsistent
with each other (Schaefer and Borcherding, 1973; Stael von Holstein, 1971;
Winkler, 1967). When assessed probabilities have been evaluated in terms of
criterion 3 (secondary validity) (Alpert and Raiffa, 1969; Brown, 1973;
Schaefer and Borcherding, 1973), the typical finding has been that they do not
agree very well with the relative frequency of the actual events. Training
improved the validity somewhat, but not as much as desired. Specifically, these
studies found subjective probability distributions over continuous quantities
to be much too tight when using fractile and equivalent assessment procedures
(see Winkler, 1967, for a complete description of these procedures). That is,
an unduly large percentage of the events fell into the extreme tails of the
assessed distributions.

It is possible that these results are an artifact of the assessment
procedures used, particularly the fractile procedure. In typical procedures
subjects are asked to state a value of a random variable such that with
probability p the true value will fall below that value, and with probability
1-p above it.

Tversky and Kahneman (1973) suggest that in judgments of this type, a
cognitive process called anchoring and adjustment may occur. They hypothesize
that when a subject is asked for values corresponding to specific fractiles,
the subject first "anchors" on the value considered most likely, and then
"adjusts" that value in the direction appropriate for the given fractile.
The adjustment process will, however, usually be insufficient, thus leading
to too tight distributions. Thus for a p = .25 partition, the subject might
assess the number appropriate for p = .50 and then reduce it somewhat, but
not enough. Similarly, according to this argument, if the subject is given a
value of the random variable and asked for the probability or odds that the
true value is below the given value, the anchoring point will be 1:1 odds or
probability of .50 and insufficient adjustment will lead to too flat
distributions. Tversky and Kahneman present some empirical evidence that this
is, in fact, what occurs.

Another possible factor contributing to the poor validity of distributions
assessed with fractiles is the fact that most experimenters phrase their
questions in terms of probabilities. Results from another task involving
uncertainty measures as responses, probabilistic inference, have shown that
responses in odds are often more valid (by criterion 5) than probability
responses (Phillips and Edwards, 1966). Other results indicate that in some
situations, odds on a logarithmic scale may be even more valid (see Goodman,
1973, for a review). It may be that subjects simply do not really understand
the meaning of very large or very small probabilities. In addition, the
cognitive adjustment process involved in the assessment task may very well
depend on the measure of uncertainty used to ask or answer the questions.


This study investigates the question of how much the elicitation technique
influences the validity (by criteria 3 and 4) of the assessed probability
distributions. Several elicitation procedures of the fractile and the direct
probability estimation type are applied, and several uncertainty measures are
used to investigate the effects of the questioning procedures and of the
numerical expression of uncertainty on the validity of assessed distributions.

Method 

Subjects. The Ss were 41 upper level undergraduate and graduate psychology
students at California State University, Long Beach, who participated
on a voluntary basis. All had some training in statistics with some exposure
to the Bayesian approach.

Stimuli. Stimuli were almanac questions of the type used in the experiment
by Alpert and Raiffa (1969). For example, one question was: "What was
the population of Canada in 1973?" All questions involved continuous random
variables. Such questions are convenient for research because the experimenter
knows exact answers, while subjects have relatively vague information about
them.

A questionnaire was developed for each assessment procedure. The
questionnaires were self-contained; each included a complete set of
instructions, examples, and the questions necessary to assess the probability
distributions. Twenty distributions were assessed in each questionnaire: ten
that had a percentage as the variable, e.g., the percentage of the population
of California that lived in Los Angeles County, and ten that had absolute
numbers as variables, e.g., the population of Canada. The reason for including
two types of variables was that the percentages represented bounded variables,
i.e., between 0 and 100, while the absolute numbers were only vaguely bounded.

Assessment Procedures. Five methods for assessing subjective probability
distributions on continuous variables were compared. These methods varied on
two dimensions: the measure of uncertainty used (odds, odds on a logarithmic
scale, or probability), and the type of response required (uncertainty measures
or values of the variable). A complete crossing of these variables would have
yielded six experimental groups. But the use of odds on a logarithmic scale
as stimulus with value of the unknown quantity as response does not seem
sufficiently different from use of verbal odds as stimulus and value of the
unknown quantity as response, so the former was omitted.

Elicitation methods requiring values of the unknown quantity as responses
used questions of the form "What is the number of people such that your odds
are 3:1 that the true population of Canada is less than that number?" (For
probability groups, substitute "probability is .75" for "odds are 3:1"; for
other almanac questions, change the words appropriately.) Methods requiring
uncertainty measures as responses used questions of the form "What is your
probability that the population of Canada is less than 130,000,000 people?"
for the probability group and "Is the population of Canada more likely to be
greater than or less than 130,000,000 people?" and "What are your odds?" for
the odds groups. The verbal odds group simply wrote their odds in the
appropriate blank while the logarithmic odds group marked their odds on a
logarithmically spaced scale of odds from 1:1 to 1000:1 with a blank for odds
larger than 1000:1.
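
The equivalence between the odds and probability phrasings, and the layout of a logarithmic odds scale, can be made concrete with a short sketch. The function names are hypothetical, and the treatment of the scale as a line running from 0 (at 1:1) to 1 (at 1000:1) is an assumption made for illustration; the questionnaire's actual layout is not described beyond its endpoints.

```python
import math


def odds_to_probability(odds_for: float, odds_against: float = 1.0) -> float:
    """Convert odds of odds_for:odds_against into a probability."""
    return odds_for / (odds_for + odds_against)


def probability_to_odds(p: float) -> float:
    """Convert a probability strictly between 0 and 1 into odds of x:1."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def log_scale_position(odds: float, max_odds: float = 1000.0) -> float:
    """Relative position (0 to 1) of x:1 odds on a log scale from 1:1 to max_odds:1."""
    if not 1.0 <= odds <= max_odds:
        raise ValueError("odds fall outside the printed scale")
    return math.log10(odds) / math.log10(max_odds)


if __name__ == "__main__":
    print(odds_to_probability(3))    # the "odds are 3:1" phrasing
    print(probability_to_odds(0.75)) # the "probability is .75" phrasing
    print(log_scale_position(10))    # 10:1 sits one third of the way along
```

On such a scale each tenfold increase in odds occupies the same distance, so very large odds are as easy to mark as moderate ones.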

This paper uses the abbreviations ODDS, PROB, and LOGODDS for the methods
requiring responses of odds, probabilities, and odds on a logarithmic scale
respectively. Procedures requiring values of the variable as responses, the
commonly used fractile methods, are abbreviated ODDSFRAC and PROBFRAC for
questions phrased in odds and probabilities respectively.

For the ODDSFRAC and PROBFRAC procedures the median, two quartiles and
the .01 and .99 fractiles were assessed for each question. For the ODDS, PROB,
and LOGODDS procedures, five values of the variable were given and the
corresponding uncertainty measures were assessed. These five values were
determined in the following manner. For the percentage variables they were
randomly selected for each question from a uniform distribution between 1 and
99. For the absolute number variables, five colleagues were asked to give
ranges of the variables that they were absolutely certain would contain the
true value. The values given to the subjects were then selected randomly from
a uniform distribution between the minimum and maximum values given by the
five colleagues. These procedures were used to minimize the information given
to Ss about the range of the variables. Some information was necessarily
transmitted on the questions involving absolute number variables since the
randomly selected values were all in some sense reasonable. However, the
randomly selected values on the percentage variables did not add any
information to the already known bounds.
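
The selection of stimulus values described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure in detail: the function names are hypothetical, the values are drawn as continuous numbers and returned sorted (the source does not say whether they were rounded or ordered), and the colleague ranges in the usage lines are invented numbers.

```python
import random


def percentage_stimuli(rng: random.Random, n: int = 5) -> list:
    """Draw n values uniformly between 1 and 99 for a percentage question."""
    return sorted(rng.uniform(1.0, 99.0) for _ in range(n))


def absolute_stimuli(colleague_ranges, rng: random.Random, n: int = 5) -> list:
    """Draw n values uniformly between the smallest lower bound and the
    largest upper bound among the ranges supplied by the colleagues."""
    low = min(lower for lower, _ in colleague_ranges)
    high = max(upper for _, upper in colleague_ranges)
    return sorted(rng.uniform(low, high) for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(1975)  # fixed seed so the draw can be repeated
    # Invented "absolutely certain" ranges from five hypothetical colleagues.
    ranges = [(5e6, 60e6), (10e6, 40e6), (8e6, 100e6), (15e6, 35e6), (2e6, 80e6)]
    print(percentage_stimuli(rng))
    print(absolute_stimuli(ranges, rng))
```

Because the pooled range comes from judges who already know something about the quantity, every value drawn from it is plausible to some degree, which is the information leak noted in the text.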

In the ODDSFRAC and PROBFRAC procedures, Ss simply wrote in the values
for the given fractiles. In the PROB procedure the responses required were
probabilities that the true value was less than the given values, again simply
written in an appropriate blank. The ODDS and LOGODDS procedures required
that the S first state if the true value was more likely to be above or below
the given value and then how much more likely, by writing down the odds in the
form x:1 for the ODDS group or by marking on a logarithmic scale of odds from
1:1 to 1000:1 with a blank to fill in if the odds were larger than 1000:1 for
the LOGODDS group.

The Ss were randomly assigned to the five assessment procedures, with
9, 9, 7, 8, and 8 Ss in the ODDS, PROB, LOGODDS, ODDSFRAC, and PROBFRAC groups
respectively. The group sizes were unequal because some questionnaires were
not returned.


Results 

The basic data analyses compared actual relative frequencies with those
expected from perfectly valid and unbiased distributions. Three such
comparisons were made: the relative frequency of true values falling below the
.01 value or above the .99 value of the cumulative distributions, called
"surprises"; the relative frequency of true values falling within the
interquartile ranges; and the relative frequency of true values falling below
the assessed medians.
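
For fractile assessments, the three comparisons can be computed directly from the five assessed fractiles and the true value. The sketch below is an illustration with hypothetical names (`FractileAssessment`, `summarize`); how ties at a boundary were counted in the study is not stated, so the treatment of equality here is an assumption. For perfectly valid distributions the expected percentages are 2, 50, and 50.

```python
from dataclasses import dataclass


@dataclass
class FractileAssessment:
    """One assessed distribution: five fractiles plus the true answer."""
    f01: float
    f25: float
    f50: float
    f75: float
    f99: float
    true_value: float


def summarize(assessments) -> dict:
    """Percentages of surprises, interquartile hits, and values below the median."""
    n = len(assessments)
    surprises = sum(a.true_value < a.f01 or a.true_value > a.f99
                    for a in assessments)
    in_iqr = sum(a.f25 <= a.true_value <= a.f75 for a in assessments)
    below_median = sum(a.true_value < a.f50 for a in assessments)
    return {
        "surprises_pct": 100.0 * surprises / n,
        "interquartile_pct": 100.0 * in_iqr / n,
        "below_median_pct": 100.0 * below_median / n,
    }
```

Too many surprises together with too few interquartile hits indicates distributions that are too tight; a below-median percentage far from 50 indicates a displaced median.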

For the ODDS, PROB, and LOGODDS procedures on some occasions it was not
possible to determine for certain whether a true value fell within or outside
the relevant range. For example, in the PROB procedure if the true value fell
between values that the S had assigned probabilities of .10 and .30, it was
not possible to determine whether the true value was within or outside the
interquartile range. The relative frequencies for such cases were calculated
in two ways: by excluding such occurrences, and by using linear interpolation
on log odds to determine the location of the true value on the cumulative
subjective distribution. The appropriate relative frequencies, calculated
across Ss within each procedure and expressed as percentages, are presented in
Tables 1, 2, and 3, along with the number of distributions used for each
calculation and 95% credible intervals on those percentages. The credible
intervals were calculated using an algorithm suggested by Jackson (1974) for
finding highest density regions in beta distributions and assuming a uniform
prior distribution over relative frequency. These percentages can be compared
with the expected percentages: 2% for Table 1 and 50% for Tables 2 and 3.
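
Linear interpolation on log odds can be sketched as below. The function names are hypothetical, and the choice to interpolate linearly in the value of the variable between the two bracketing assessed points is our reading of the description, not a documented detail. Probabilities of exactly 0 or 1 have no finite log odds and are not handled.

```python
import math


def log_odds(p: float) -> float:
    """Natural log of the odds corresponding to probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def from_log_odds(z: float) -> float:
    """Probability corresponding to log odds z."""
    return 1.0 / (1.0 + math.exp(-z))


def interpolate_cumulative(x_low, p_low, x_high, p_high, x_true) -> float:
    """Cumulative probability at x_true, interpolating log odds linearly
    between the assessed points (x_low, p_low) and (x_high, p_high)."""
    if not x_low <= x_true <= x_high:
        raise ValueError("x_true must lie between the two given values")
    weight = (x_true - x_low) / (x_high - x_low)
    z = log_odds(p_low) + weight * (log_odds(p_high) - log_odds(p_low))
    return from_log_odds(z)


if __name__ == "__main__":
    # True value halfway between values assigned probabilities .10 and .30.
    print(interpolate_cumulative(20.0, 0.10, 40.0, 0.30, 30.0))
```

In the .10 to .30 example from the text, a true value midway between the two given values is placed at a cumulative probability of about .18, below the .25 quartile and therefore outside the interquartile range.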

In interpreting Tables 1 and 2, it helps to remember that an excessively
peaked subjective probability distribution will produce too many surprises and
too few true values within the interquartile range, while an excessively flat
distribution will do the opposite. The relative frequencies were calculated
separately for questions with percentage variables and absolute number
variables to permit easier comparison with past results that used only
percentage variables. In addition, this breakdown facilitated comparisons
between the fractile methods and the methods requiring uncertainty measures as
responses. The results of the percentage questions are probably most closely
linked to the purposes of the experiment, since no additional information was
given the Ss on these questions. But the Tables show general similarity between
the two kinds of results.

The results in Tables 1 and 2 can be interpreted in terms of the tightness
of the assessed subjective distributions. The most striking result was
the difference between the relative frequency of surprises in procedures
requiring uncertainty measures as responses and procedures requiring fractiles
as responses. Except for the LOGODDS procedure the former methods produced a
much smaller relative frequency of surprises, indicating flatter distributions.
The difference was in the direction suggested by the anchoring and adjustment
process, but the distributions assessed by the ODDS and PROB methods were not
too flat; they were about right, though not quite flat enough. The use of
interpolation does not seem to change the results qualitatively. If all
distributions for which interpolation was required were assumed to be
surprises, a quite unreasonable assumption, the relative frequency of surprises
would be 24.7% and 14.5% for the ODDS and PROB procedures respectively, still
at or below the surprise frequencies of the fractile procedures.

Note to Table 1. Columns headed "Credible Intervals" are the lower and upper
bounds of the 95% credible intervals on corresponding percentages of surprises,
based on all data (including interpolations).

The relative frequency of true values within the interquartile range (a
more stable and more important measure than surprises for most purposes) shows
generally too peaked distributions except for the PROB procedure and the
PROBFRAC procedure on percentage variables. But all values are reasonably close
to 50% except for the LOGODDS group, which is absurdly peaked. Table 2 shows
an interaction: the use of odds in the fractile assessment procedures produced
tighter distributions than the use of probabilities, while for procedures
requiring uncertainty measures as responses, the converse is true. However,
this conclusion would not hold up as convincingly (though it would probably be
statistically significant) if the exceedingly peaked LOGODDS group were omitted
from the analysis. This peculiarity of the LOGODDS group may be artifactual;
the line of thought leading to that conclusion is discussed below.

The finding that too few true values fall below the assessed medians
implies that the distributions as a whole are shifted along the x axis to the
left of where they should be, i.e., give more probability than they should to
low values and less than they should to high values.

In interpreting the credible intervals in Tables 1, 2, and 3, the
assumption of a uniform prior may be questioned. In many cases, e.g., the
surprise frequencies, the interquartile range frequencies of the LOGODDS group,
and some of the frequencies below the assessed medians, the data are striking
enough so the prior is of little importance. For the other interquartile range
frequencies and frequencies below the assessed medians, a more peaked prior
with a mean of .50 could cause 50% to be included in credible intervals in
which it is not included using a uniform prior. Thus the credible intervals
with endpoints near the expected relative frequencies should be interpreted
with caution.
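
The sensitivity of a credible interval to the prior can be illustrated with a small grid computation. This sketch is not the Jackson (1974) algorithm; it is a simple stand-in that approximates the highest density region of a beta posterior by ranking grid cells by density. The function names, the counts (75 of 180), and the peaked Beta(100, 100) prior are all invented for illustration, and the method assumes a unimodal posterior (both beta parameters at least 1).

```python
import math


def beta_log_pdf(x: float, a: float, b: float) -> float:
    """Log density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)


def highest_density_interval(successes, trials, mass=0.95,
                             prior_a=1.0, prior_b=1.0, grid=10000):
    """Approximate highest density interval for a relative frequency.

    The posterior is Beta(prior_a + successes, prior_b + failures);
    prior_a = prior_b = 1 is the uniform prior.
    """
    a = prior_a + successes
    b = prior_b + (trials - successes)
    width = 1.0 / grid
    # Cell midpoints avoid evaluating the log density at exactly 0 or 1.
    density = [math.exp(beta_log_pdf((i + 0.5) * width, a, b))
               for i in range(grid)]
    total = sum(density) * width
    order = sorted(range(grid), key=lambda i: density[i], reverse=True)
    covered, chosen = 0.0, []
    for i in order:  # add the densest cells first until the mass is covered
        chosen.append(i)
        covered += density[i] * width
        if covered >= mass * total:
            break
    return min(chosen) * width, (max(chosen) + 1) * width


if __name__ == "__main__":
    print(highest_density_interval(75, 180))                 # uniform prior
    print(highest_density_interval(75, 180, prior_a=100.0,
                                   prior_b=100.0))           # peaked prior
```

With these invented counts the interval under the uniform prior lies entirely below .50, while the interval under the peaked prior extends past .50, which is the kind of reversal the caution above refers to.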

As a further means of comparing the various assessment procedures, a
proper scoring rule was applied to the assessed distributions. The scoring
rule used was the continuous form of the ranked probability score (Epstein,
1969; Murphy, 1969) developed by a limiting process suggested by Brown (1970).
Matheson and Winkler (1974) have illustrated the continuous ranked probability
score as one of a class of scoring rules that they proved to be strictly
proper. To apply the scoring rule all absolute variables were linearly
transformed onto the zero to one interval to make the scores of all
distributions comparable, by setting the largest value given by any S equal to
one and the smallest value equal to zero unless there was a natural zero. The
scoring rule then took the form

    S = \int_{-\infty}^{t} R^2(x)\,dx + \int_{t}^{\infty} [1 - R(x)]^2\,dx

where t is the (transformed) true value and R(x) is the cumulative probability
distribution of x. A piecewise linear approximation was used for R(x) between
assessed values. There was no theoretical justification for this approximation,
but because of the known insensitivity of scoring rules (von Winterfeldt and
Edwards, 1973), it probably had little effect on the results.
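
A continuous ranked probability score of this kind, with a piecewise linear R(x), can be computed exactly because each piece is the integral of a squared linear function. The sketch below is illustrative and rests on assumptions the text does not settle: the names are hypothetical, the assessed points must lie strictly inside the transformed 0 to 1 interval with increasing x, and R is anchored at R(0) = 0 and R(1) = 1 to close the tails. Lower scores are better.

```python
def _square_integral(x0, u0, x1, u1):
    """Exact integral of u(x)^2 where u runs linearly from u0 at x0 to u1 at x1."""
    return (x1 - x0) * (u0 * u0 + u0 * u1 + u1 * u1) / 3.0


def continuous_rps(points, t):
    """Score for a piecewise linear cumulative distribution.

    points: (x, cumulative probability) pairs with 0 < x < 1, increasing in x.
    t: transformed true value, between 0 and 1.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in the transformed 0-1 interval")
    # Assumption: the tails are closed by anchoring R(0) = 0 and R(1) = 1.
    knots = [(0.0, 0.0)] + sorted(points) + [(1.0, 1.0)]
    score = 0.0
    for (x0, r0), (x1, r1) in zip(knots, knots[1:]):
        if x1 <= t:        # segment lies below t: integrate R^2
            score += _square_integral(x0, r0, x1, r1)
        elif x0 >= t:      # segment lies above t: integrate (1 - R)^2
            score += _square_integral(x0, 1.0 - r0, x1, 1.0 - r1)
        else:              # t falls inside the segment: split it at t
            r_t = r0 + (r1 - r0) * (t - x0) / (x1 - x0)
            score += _square_integral(x0, r0, t, r_t)
            score += _square_integral(t, 1.0 - r_t, x1, 1.0 - r1)
    return score


if __name__ == "__main__":
    assessed = [(0.40, 0.01), (0.45, 0.25), (0.50, 0.50),
                (0.55, 0.75), (0.60, 0.99)]
    print(continuous_rps(assessed, 0.50))  # true value at the median
    print(continuous_rps(assessed, 0.90))  # true value far in the tail
```

The score rewards a distribution that is tight around the true value and penalizes one that is tight around the wrong value, which is how it combines extremeness with agreement with the facts.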

The mean scores, presented in Table 4, were consistent with previous
analyses in that the ODDS and PROB procedures had better (lower) scores. This
was expected since these procedures did not produce distributions that were as
much peaked as those produced by the other procedures, and also produced
slightly less median displacement. It is interesting that the LOGODDS
procedure, in spite of its relatively poor showing in Tables 1-3, was still
preferable to either fractile procedure, according to the scoring rule. We
believe this is because the scoring rule rewards probabilities that are extreme
as well as close to the expected relative frequencies. (See criterion 4 above.)
Apparently, as evaluated by the scoring rule, the extremeness of the LOGODDS
procedure compensated for its poor agreement with expected relative
frequencies.

Discussion 

The use of fractile methods to assess subjective probability distributions
in this study led to the same excessive number of surprises found in previous
studies (Alpert and Raiffa, 1969; Brown, 1973; Schaefer and Borcherding, 1973).
Although training seems to improve the results, it appears other methods of
assessment are needed. The ODDS and PROB procedures used in this study seem to
provide more valid results. The relative frequency of true values in the tails
of the distributions was much smaller for these procedures. The LOGODDS
procedure produces little if any improvement over the fractile procedures. In
fact, it may be worse. In this study as in previous studies, odds assessed on a
logarithmic scale seem to produce larger odds than verbal odds (see Goodman,
1973). Whether the larger odds are more valid depends on the task and task
parameters. In this study they were not.

Although different assessment procedures yielded distributions that
differed greatly in the relative frequency of surprises, the relative
frequencies of true values falling within the interquartile range did not
differ substantially, except for the LOGODDS procedure. In this part of the
distributions the relative frequencies were near what they should have been,
suggesting that in the middle range of the variables, subjective probability
distributions are quite valid, independent of the assessment technique. In
practical situations this is often the range of primary concern. What biases do
exist may possibly be eliminated by combining the use of odds and probabilities
in the assessment process, since the two measures of uncertainty seem to lead
to opposite biases.

A more serious problem is the degree of the median displacement. The
underestimation of both percentage and absolute number variables is not
entirely consistent with previous findings. Typically low percentages have been
overestimated while high percentages have been underestimated (Alpert and
Raiffa, 1969; Schaefer and Borcherding, 1973), which was not the case in this
study. No consistent median displacement pattern has been found on absolute
number variables; Brown (1973) found overestimation and Alpert and Raiffa
(1969) found underestimation. It appears that a thorough investigation of what
types of questions lead to which median biases is needed.

The findings of this study seem to be generally consistent with the
anchoring and adjustment process hypothesized by Tversky and Kahneman (1973).
In particular, the difference in the number of surprises between the ODDS and
PROB procedures and the ODDSFRAC and PROBFRAC procedures was in the direction
suggested by that hypothesis. The distributions assessed by the former
procedures were flatter than those assessed by the latter procedures, but ODDS
and PROB distributions were not too flat, suggesting that some other process is
working in addition to the anchoring and adjustment process. Perhaps there is
also a real tendency to overestimate knowledge, which leads to too tight
distributions.

The relative frequency of true values falling within the interquartile
ranges seems to tell another story. The relative tightness of these ranges
showed an interaction between whether odds or probabilities were used as the
measure of uncertainty and the type of response required. This suggests that
if the judgments are made by anchoring and adjusting, quantitatively different
adjustment processes were occurring for odds and probabilities. Apparently in
the fractile procedures a larger adjustment in the value was needed to go from
1:1 odds to 3:1 odds than was needed to go from a probability of .50 to .75.
Correspondingly a smaller adjustment in odds than probability was needed to
adjust to some fixed value. Again it appears that the hypothesis of an
anchoring and adjustment process cannot completely explain the results.
Although this process does seem to play a role in the judgments required in
this task, more complex processes were also occurring.

Obviously this type of sterile laboratory experiment cannot provide the
ultimate answer to the question of which method of probability assessment is
best for real world decision problems. What it can provide is evidence about
the biases involved in various assessment procedures. The better these biases
are understood, the better they can be counteracted. The practical solution
will usually include a combination of various procedures incorporating many
consistency checks (Spetzler and Stael von Holstein, 1972). Such processes
can utilize the best aspects of each procedure while allowing probable biases
to be explained and perhaps reduced or even eliminated.